Abstract


This essay proposes a novel two-stage method that segments color images and self-optimizes the hyperparameters of the model. In the first stage, we preprocess the RGB color image into a multi-channel vector-valued image, which not only provides more spectral information but also avoids the high inter-channel correlation of the RGB color space. An SVM classifier then segments the multi-channel image, while grid search finds an optimized hyperparameter set for the SVM model. This preliminary segmentation can be noisy and imprecise, because the SVM classifier uses only spectral information. In the second stage, we apply a variational model that uses spatial information to denoise and refine the first-stage result and produce the final segmentation. Experiments indicate that, compared with other advanced segmentation methods, our two-stage method achieves high precision, is widely applicable, and gives results closer to human vision.
1 Introduction


Color image segmentation is an important part of image processing. It is widely used and is a foundation of image recognition and interpretation. Color image segmentation divides a disorganized picture into multiple regions with similar features that are easy to recognize. The feature can be color, shape or texture [15]. Such a division helps the computer understand an image the way humans do and analyze it with algorithms.

Since most early color image segmentation methods were adapted from methods for gray-scale images, some of them can, at least potentially, be applied to different kinds of images. Let $\Omega \subset \mathbb{R}^2$ be a bounded open set, and let $f: \Omega \to \mathbb{R}^d$ with $d \geq 1$ be a given vector-valued image. When $d = 1$, it is a gray-scale image. When $d = 3$, it is what we call an RGB (red-green-blue) color image. When $d > 3$, it can be an HSI (hyperspectral image), a medical image or a multispectral image. Unlike an RGB color image, which has only 3 channels, an HSI image is composed of dozens or even hundreds of channels and therefore contains much more spectral information. Because of this, HSI segmentation methods pay attention to spectral as well as spatial information [10]. Inspired by HSI segmentation methods, some color image segmentation methods also lift the image to a higher dimension, building a vector-valued image with more channels [5] [28] that gives the classifier more spectral signal.

Our main goal in RGB color image classification is to assign each pixel of the image a unique label from a set of manually annotated ground-truth labels. A great number of classification techniques have been developed in the last two decades, for example ANNs (artificial neural networks) [22], sparse representation [33] and SVMs (support vector machines), on which our method is mainly based. SVMs map the data into a high-dimensional feature space and then construct an optimal separating hyperplane in that space. Traditional pixel-wise classifiers such as SVMs use only spectral information and ignore spatial information. Even though their pixel-wise accuracy is acceptable, they have unavoidable limitations: their classification performance is degraded by interference, by the Hughes phenomenon, which is caused by the imbalance between the number of dimensions and the number of training samples [19], and by noise [26]. Denoising methods [24] [13] [12] can restore a noisy result by incorporating spatial information.


Figure 1: An example of a noisy segmentation result obtained using SVM alone


Spectral-spatial classification methods have therefore become a main focus. [10] proposed a simple and efficient way to segment HSI images that makes full use of both spectral and spatial information. The method consists of two stages. First, an SVM classifier classifies each pixel based on its spectral signal and predicts a probability map giving, for each pixel, the probability of belonging to each class. Second, an image denoising method based on convex optimization denoises the probability map obtained in the first stage and makes the result smoother. This denoising technique relies entirely on spatial information, which nicely compensates for the weakness of the SVM classifier in the first stage. Both stages use the same training set, taken from the manually labelled ground truth of the image.

For an SVM classifier, hyperparameter selection is also very important, since different parameter choices greatly affect the result. Apart from the number of dimensions, another difference between color image segmentation and HSI segmentation lies in the image properties. Most HSI images are satellite remote sensing images with relatively similar properties. Therefore, a single optimized parameter pair or a fixed parameter list can be enough to obtain satisfactory segmentation results for most images; this is the approach adopted in [10]. For color images, however, properties differ greatly between images, and so does the optimal parameter choice. An efficient hyperparameter optimization method is therefore necessary for a highly adaptable color image segmentation approach.

Hyperparameter optimization selects a suitable set of hyperparameters from the hyperparameter space to balance the bias and variance of the model, so as to improve its effectiveness and performance. Common tuning methods include manual setting, random search [2], grid search [3], Bayesian optimization [1] and particle swarm optimization [16]. For our SVM classifier, two parameters need to be optimized: the penalty coefficient $C$ and the parameter $\gamma$ (denoted $g$) of the RBF kernel. Through experiments, we found that the optimal parameters basically fall within a bounded range of powers of 2. Since there are only two parameters and the search range is manageable, grid search, the traditional SVM parameter optimization method, is an appropriate choice.

Grid search is a simple, mature and accurate parameter optimization approach that is widely used with SVM models. However, the most basic grid search wastes a lot of computation on parameter combinations that do not need to be tested at all. To reduce the computational cost, Chih-Jen Lin et al. [18] proposed a two-step grid search that divides the search into a rough search and a fine search. In 2006, Yukun Bao et al. [3] proposed following the path of maximum error descent during model training to optimize the grid search path, which achieved good results in time series prediction. [27] accelerated grid search by reducing the size of the training set, and our method builds on this idea. By searching the two-dimensional hyperparameter plane over powers of 2, we can obtain a proper hyperparameter set quickly.
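The two-step (rough, then fine) grid search of [18] can be sketched as follows. This is a minimal illustration, not the implementation of [18] or of our method: `score(log2_c, log2_g)` is a hypothetical stand-in for the cross-validation accuracy, and the exponent range and step sizes are assumptions chosen only for the example.

```python
import numpy as np

def search_grid_exps(score, c_exps, g_exps):
    """Evaluate score at every (log2 C, log2 g) pair; return (c_exp, g_exp, best score)."""
    best = (None, None, -np.inf)
    for ce in c_exps:
        for ge in g_exps:
            s = score(ce, ge)
            if s > best[2]:
                best = (ce, ge, s)
    return best

def coarse_to_fine(score, lo=-10.0, hi=10.0, coarse_step=2.0, fine_step=0.25):
    # Step 1: rough search over a coarse grid of exponents.
    coarse = np.arange(lo, hi + coarse_step / 2, coarse_step)
    ce, ge, _ = search_grid_exps(score, coarse, coarse)
    # Step 2: fine search in the neighborhood of the coarse optimum.
    fine_c = np.arange(ce - coarse_step, ce + coarse_step + fine_step / 2, fine_step)
    fine_g = np.arange(ge - coarse_step, ge + coarse_step + fine_step / 2, fine_step)
    return search_grid_exps(score, fine_c, fine_g)

# Toy score peaked at log2 C = 3, log2 g = -5; the chosen pair is then C = 2**ce, g = 2**ge.
ce, ge, s = coarse_to_fine(lambda c, g: -((c - 3.0) ** 2 + (g + 5.0) ** 2))
```

The fine step only evaluates a small neighborhood of the coarse optimum, which is where the saving over a single dense grid comes from.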

Our method first preprocesses the RGB color image to lift its dimension and enrich the spectral information. In the first stage, GS-SVM builds the classifier, optimizes the hyperparameters of the model, and segments the image based on spectral information only. The second stage smooths and denoises the result of the first stage using spatial information. Experimental results show that our method performs better than other multi-stage approaches for color image segmentation [5]. Meanwhile, the method adapts well: without requiring any preset parameters, it achieves high segmentation accuracy on both noisy and clean images.

This paper is arranged as follows. Section 2 reviews the SVM classifier, grid search hyperparameter optimization and variational denoising methods. Section 3 describes our two-stage segmentation method based on GS-SVM and a variational model. Section 4 presents experimental results demonstrating its performance and adaptability. Conclusions and future work are given in Section 5.


2 Review 


2.1 SVM and v-SVC 


The support vector machine (SVM) is a generalized linear classifier that can be used for classification and regression. It uses convex optimization to obtain high predictive accuracy while avoiding overfitting the data. An SVM can be viewed as a linear classifier operating in a high-dimensional feature space. SVMs seek the hyperplane that maximizes the margin, i.e., the Euclidean distance between the hyperplane and the closest data points of the two classes. Since SVMs were introduced in the last century, they have become popular and developed rapidly because they are robust, effective and efficient.

We start with a binary linear classifier of the form $f(x) = w^T x + b$, shown in Figure 2. In a 2-dimensional space it is a straight line, where $w$ is the line's normal and $b$ is the bias. Given $N$ linearly separable data points $x_i$ with labels $y_i \in \{-1, 1\}$, we look for $f(x_i) = w^T x_i + b$ that separates the data set for $i = 1, \dots, N$. Starting from $w = 0$, we go through the points $\{x_i, y_i\}$ one by one. If $x_i$ is classified correctly, $w$ stays the same. If $x_i$ is misclassified, we update $w \leftarrow w + \alpha y_i x_i$ with step size $\alpha$. Accumulating these updates gives $w = \sum_i \alpha_i y_i x_i$, where $\alpha_i$ is the total step taken at $x_i$, and the resulting $f$ classifies all the data points correctly into 2 classes. The basic SVM has the same form, $f(x) = \sum_i \alpha_i y_i x_i^T x + b$, where the $x_i$ with $\alpha_i \neq 0$ are the support vectors, i.e., the points closest to the decision boundary $f(x) = 0$.


Figure 2: An illustration of a linear SVM (labels: linearly separable data; $f(x) = \sum_i \alpha_i y_i x_i^T x + b$; support vectors)
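The iterative update described above is the classical perceptron rule. The NumPy sketch below also records the dual weights $\alpha_i$, so that $w = \sum_i \alpha_i y_i x_i$ can be checked directly. The step size `alpha`, the epoch cap and the bias update $b \leftarrow b + \alpha y_i$ are illustrative assumptions; note that the perceptron finds some separating hyperplane, not necessarily the maximum-margin one an SVM would find.

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_epochs=100):
    """Perceptron for labels y in {-1, +1}; returns w, b and dual weights a."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    a = np.zeros(n)  # a[i] accumulates alpha every time x_i is misclassified
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:  # misclassified (or on the boundary)
                w += alpha * y[i] * X[i]
                b += alpha * y[i]
                a[i] += alpha
                errors += 1
        if errors == 0:  # every point is classified correctly: stop
            break
    return w, b, a

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b, a = perceptron(X, y)
```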


An SVM of this kind is called a hard-margin SVM: all elements are separated strictly into 2 classes, and no misclassification is allowed. Such a classifier is not robust and adapts poorly. Therefore, we introduce slack variables $\xi_i \geq 0$, which allow misclassification to some extent. For $0 < \xi_i \leq 1$, the point lies between the margin and the hyperplane, on the correct side; we accept it as a margin violation. For $\xi_i > 1$, the point is misclassified. With the slack variables, the optimization problem becomes


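$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to } y_i(w^T x_i + b) \geq 1 - \xi_i, \ \xi_i \geq 0, \ i = 1, \dots, N.$$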
This is called a soft-margin classifier. It is much more robust, at the price of some slight misclassification. Note that as $C \to \infty$, it becomes the hard-margin SVM above.
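For a fixed classifier $f$, the smallest slack that satisfies the soft-margin constraint $y_i f(x_i) \geq 1 - \xi_i$ is $\xi_i = \max(0, 1 - y_i f(x_i))$ (the hinge loss), which directly yields the cases described above. A small NumPy sketch; the function name and the example values are illustrative, and a point exactly on the hyperplane ($\xi_i = 1$) is counted here as a margin violation.

```python
import numpy as np

def slack_categories(f_values, y):
    """Return slacks xi_i = max(0, 1 - y_i f(x_i)) and a category for each point."""
    xi = np.maximum(0.0, 1.0 - y * f_values)
    labels = np.where(xi == 0, "outside margin",
                      np.where(xi <= 1, "margin violation", "misclassified"))
    return xi, labels

xi, labels = slack_categories(np.array([2.0, 0.5, -0.5]), np.array([1, 1, 1]))
```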

However, when the data points are not linearly separable, this method does not work. Instead, we need to map the data points to a higher-dimensional space and then separate them into 2 classes by a hyperplane in that space.

The $\nu$-SVC we use is a powerful support vector classifier in which the parameter $\nu$ is set before training: it is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. The formulation is:


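$$\begin{aligned}
\min_{w, b, \xi, \rho} \quad & \frac{1}{2}\|w\|_2^2 - \nu\rho + \frac{1}{m}\sum_{i=1}^{m} \xi_i \\
\text{subject to} \quad & y_i\big(w \cdot \phi(x_i) + b\big) \geq \rho - \xi_i, \quad i = 1, 2, \dots, m, \\
& \xi_i \geq 0, \quad i = 1, 2, \dots, m, \\
& \rho \geq 0.
\end{aligned} \tag{1}$$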


As above, $w \in \mathbb{R}^n$ is the normal to the hyperplane and $b \in \mathbb{R}$ is the bias, $\nu \in (0, 1]$ bounds the fraction of training errors, $\xi_i$ are the slack variables that turn the hard margin into a soft margin and allow errors in training, and $\rho / \|w\|_2$ is the Euclidean distance between the support vectors and the classification hyperplane.

After the mapping $\phi$ to the higher-dimensional space, the dimension of this SVC can be very high. To make the optimization problem easier to solve, we introduce its Lagrangian dual form:


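$$\begin{aligned}
\max_{\alpha} \quad & -\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\
\text{subject to} \quad & 0 \leq \alpha_i \leq \frac{1}{m}, \quad i = 1, \dots, m, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \sum_{i=1}^{m} \alpha_i \geq \nu.
\end{aligned} \tag{2}$$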


We can use quadratic programming to obtain the optimal Lagrange multipliers. The decision function for each pixel $x$ is then

$$g(x) = \operatorname{sign}(f(x)), \quad \text{where } f(x) = \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b. \tag{3}$$
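Given the support vectors $x_i$, labels $y_i$, multipliers $\alpha_i$ and bias $b$ returned by the QP solver, (3) can be evaluated as below. This is a NumPy sketch with the Gaussian kernel; the support vectors and coefficients in the example are made up for illustration and do not come from a trained model.

```python
import numpy as np

def rbf_kernel(A, B, g):
    # K[i, j] = exp(-g * ||A_i - B_j||^2), computed for all pairs at once
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-g * np.maximum(sq, 0.0))

def decision(X, sv, alpha, y_sv, b, g):
    """Evaluate f(x) = sum_i alpha_i y_i K(x_i, x) + b and g(x) = sign(f(x))."""
    f = rbf_kernel(X, sv, g) @ (alpha * y_sv) + b
    return f, np.sign(f)

sv = np.array([[0.0, 0.0], [4.0, 4.0]])      # hypothetical support vectors
alpha = np.array([1.0, 1.0])                 # hypothetical multipliers
y_sv = np.array([1.0, -1.0])
X = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
f, labels = decision(X, sv, alpha, y_sv, b=0.0, g=1.0)
```

The point $(2, 2)$ is equidistant from both support vectors, so it lies exactly on the decision boundary $f(x) = 0$.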


According to Mercer's theorem, if the kernel $K$ is a symmetric, positive semidefinite function, it can be represented as an inner product of two feature vectors, $K(x, y) = \phi(x) \cdot \phi(y)$. With this representation, we only need to know the kernel function $K$ instead of the feature map $\phi$. Some of the best-known kernels that satisfy Mercer's condition are:


1. Linear kernel function:
$$K(x_i, x_j) = x_i^T x_j.$$
It applies no transformation to the data points.
2. Polynomial kernel function:
$$K(x_i, x_j) = (\gamma x_i^T x_j + b)^d.$$
It is equivalent to a polynomial transformation; $\gamma$, $b$ and $d$ are hyperparameters.
3. Gaussian kernel function:
$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|_2^2\right).$$
The Gaussian kernel is very common in practice and is the most widely used radial basis function (RBF) kernel. It is popular not only because it has a single parameter to optimize, but more importantly because it corresponds to an inner product in an infinite-dimensional feature space. We apply the Gaussian kernel in our SVM classifier, and $\gamma$ is the hyperparameter $g$ to be optimized by grid search.
4. Sigmoid kernel function:
$$K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r),$$
where $\gamma$ and $r$ are hyperparameters. Unlike the kernels above, it satisfies Mercer's condition only for some values of $\gamma$ and $r$.
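The four kernels above can be written as Gram-matrix functions in NumPy. Mercer's condition requires every Gram matrix to be positive semidefinite, which the sketch checks numerically through the smallest eigenvalue. The parameter values and the random data are arbitrary illustrations.

```python
import numpy as np

def linear(A, B):
    return A @ B.T

def polynomial(A, B, gamma=1.0, b=1.0, d=3):
    return (gamma * A @ B.T + b) ** d

def gaussian(A, B, gamma=0.5):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sigmoid(A, B, gamma=0.5, r=-1.0):
    return np.tanh(gamma * A @ B.T + r)

def min_eigenvalue(K):
    # Smallest eigenvalue of the (symmetrized) Gram matrix; >= 0 up to rounding if PSD.
    return np.linalg.eigvalsh((K + K.T) / 2).min()

X = np.random.default_rng(0).normal(size=(20, 3))
for name, k in [("linear", linear), ("polynomial", polynomial),
                ("gaussian", gaussian), ("sigmoid", sigmoid)]:
    print(name, min_eigenvalue(k(X, X)))
```

For the sigmoid kernel the printed eigenvalue may be negative, which is consistent with it satisfying Mercer's condition only for some parameter values.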


2.2 Parameter Optimization Based on Grid Search 
2.2.1 Cross Validation 


Figure 3: Sketch of the Gaussian kernel function

Cross-validation (CV) is a common statistical method for estimating the generalization ability of a classification model. Its main idea is to randomly divide the data set into several subsets, train a model on some of them, and use a held-out subset to test the trained classifier. Common cross-validation methods include hold-out cross-validation, leave-one-out cross-validation [21] [31] and K-fold cross-validation, which is the one we use in our method. As Fig. 4 shows, the data set is divided randomly into K subsets, or “folds”, of about the same size. One fold is chosen as the test set, the model is fitted on the remaining K-1 folds, and the accuracy is computed on the observations in the held-out fold. This process is repeated K times so that each fold serves once as the test set, and the cross-validation accuracy is the average of the K accuracies. The hyperparameter setting whose SVM classifier achieves the highest cross-validation accuracy is taken as our final classification model.


Figure 4: K-fold cross validation (Iterations 1 to K; in each iteration the folds are split into training data and test data)
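K-fold cross-validation as described above can be sketched in NumPy as follows. The split and the averaging match the description; the nearest-class-mean classifier (`fit_centroids`, `predict_centroids`) is a hypothetical stand-in used only so the sketch runs, not the SVM of our method.

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds of (almost) equal size."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)

def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Average accuracy over k iterations, each holding out one fold as test data."""
    folds = k_fold_indices(len(y), k, seed)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return float(np.mean(accs))

# Hypothetical stand-in classifier: assign each point to the nearest class mean.
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centers = model
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```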


2.2.2 Grid Search Method


The penalty coefficient $C$ and the Gaussian kernel parameter $g$ in the SVM optimization problem directly affect the performance of the classifier. The optimal parameters usually change with the training set. Especially when training on small samples, the parameters are strongly affected by the randomness of the samples, which reduces the generalization ability of the classifier. In this paper, the grid search (GS) method is used to find the optimal hyperparameters $C$ and $g$ of the SVM.

First, a grid is laid out over a given range of the $(C, g)$ plane. Then every grid point, each corresponding to a pair of $(C, g)$ values, is traversed. For each grid point, k-fold cross-validation is applied to compute the average cross-validation accuracy on the data set, and the grid point with the greatest accuracy is taken as the optimal hyperparameter set for our classifier.
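A runnable sketch of the grid search with 5-fold cross-validation is given below. To stay self-contained it replaces $\nu$-SVC with a stand-in, a regularized least-squares kernel classifier in which $1/C$ acts as the regularization strength; in our experiments the classifier is trained with LIBSVM instead. The exponent ranges follow the coarse grid suggested in the LIBSVM practical guide ($C = 2^{-5}, 2^{-3}, \dots, 2^{15}$ and $g = 2^{-15}, 2^{-13}, \dots, 2^{3}$) and are assumptions, not the ranges used in this paper.

```python
import numpy as np

def rbf_gram(A, B, g):
    # K[i, j] = exp(-g * ||A_i - B_j||^2)
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-g * np.maximum(sq, 0.0))

def fit(X, y, C, g):
    # Stand-in for nu-SVC: least-squares kernel classifier with labels in {-1, +1}.
    K = rbf_gram(X, X, g)
    coef = np.linalg.solve(K + np.eye(len(y)) / C, y.astype(float))
    return X, coef, g

def predict(model, Xt):
    X, coef, g = model
    return np.where(rbf_gram(Xt, X, g) @ coef >= 0, 1, -1)

def cv_accuracy(X, y, C, g, k=5, seed=0):
    # Fixed seed: every grid point is evaluated on the same folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train], C, g)
        accs.append(np.mean(predict(model, X[folds[i]]) == y[folds[i]]))
    return float(np.mean(accs))

def grid_search_cg(X, y, c_exps=range(-5, 16, 2), g_exps=range(-15, 4, 2)):
    """Visit every grid point (2^c, 2^g), record (C, g, a) and keep the highest a."""
    best = (None, None, -1.0)
    for ce in c_exps:
        for ge in g_exps:
            a = cv_accuracy(X, y, 2.0**ce, 2.0**ge)
            if a > best[2]:
                best = (2.0**ce, 2.0**ge, a)
    return best
```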


2.2.3 Color Space Selection 


For RGB color images, both texture and color are important information. However, since the effect of texture information on image segmentation is not significant [23], we select the color information of the image as its feature space. The RGB color space is the original representation of the color images people usually see, but a single RGB color space cannot overcome the influence of illumination and other factors on color: its R, G and B channels are highly correlated, which makes it difficult to separate background from foreground. Moreover, different color spaces have different characteristics; some of them are described below.

The HSV color space resembles the human visual system. It describes a color by hue, saturation and value (brightness), which makes the segmentation result agree better with human vision.

The YUV color space is an RGB-like color model that originated in the transition from black-and-white to color TV. Y represents brightness, and U and V together represent chroma. The human eye is less sensitive to chroma than to brightness, mainly because the retina contains far more rods, which sense brightness, than cones, which sense color. Consequently, the eye resolves brightness more finely than color. Using the YUV color space, especially the Y component, therefore helps the segmentation result fit human vision better.

CIE L*a*b* (CIELAB) is the most complete color model commonly used to describe all colors visible to the human eye. For convenience, we refer to it below as the Lab color space. The International Commission on Illumination (CIE) defined the underlying international standard for color measurement in 1931; the Lab space derived from it was officially adopted as CIELAB in 1976. According to [25], the Lab color space is perceptually uniform, which means that numerical differences between colors are approximately proportional to the perceived color differences. Such a description is close to human perception and contributes greatly to color image segmentation [8] [14] [29].

As Figure 5 shows, different channels provide different spectral information. Generally, a larger variety of channels provides more comprehensive information, making it easier to achieve a satisfactory segmentation. Our experimental results confirm this (see Table 1). Therefore, by combining suitable color channels, we obtain a multi-channel vector-valued image with more spectral information.

Our method combines the RGB, YUV and Lab spaces into a 9-dimensional color space whose channels serve as the features of the color image. This provides more spectral information to the spectral-based SVM classifier and describes the image features more reasonably, so that the different parts can be segmented more effectively.


2.3 Denoising Methods 


Figure 5: Channels in different color spaces

Let $\Omega = \{1, \dots, N_1\} \times \{1, \dots, N_2\}$ be the pixel grid, $v$ the noisy image and $u$ the restored image we seek. Total variation (TV) is one of the most widely known approaches to image denoising. It includes the total variation term $\|\nabla \cdot\|_1$ in the optimization model. In image restoration and denoising, the TV term keeps the image smooth and eliminates artifacts that restoration may introduce. However, the restored image can be too smooth, and for complex images it may lose details. Two improvements, both related to the denoising method we use [10], are described below.
The first improvement is to add a higher-order term (h.o.t.) [17]. We minimize

$$H(u) = \frac{1}{2}\|v - u\|_2^2 + \alpha_1\|\nabla u\|_1 + \frac{\alpha_2}{2}\|\nabla u\|_2^2, \tag{4}$$

where $\frac{1}{2}\|v - u\|_2^2$ handles Gaussian noise, $\alpha_1\|\nabla u\|_1$ is the total variation term, and $\frac{\alpha_2}{2}\|\nabla u\|_2^2$ is the added h.o.t., which makes $u$ smoother. This model also performs well under complex noise (Gaussian, Gamma, Poisson) [7] [11] [6].

Another denoising method, [9], introduces a technique to smooth $\|\nabla \cdot\|_1$. When second-order PDEs are used to smooth an image, the "staircase effect" sometimes occurs: after processing, the gray level becomes constant over some regions, so the image looks as if it were made up of patches of different brightness with overly sharp outlines. This method restores an image corrupted by impulse noise in 2 steps while avoiding the staircase effect. First, the adaptive median filter [20] detects the pixels that are likely to be noisy. Next, the noisy pixels are restored while all other pixels are kept unchanged, by minimizing

$$F(u) = \|v - u\|_1 + \frac{\beta}{2}\|\nabla u\|_\alpha^\alpha \quad \text{s.t. } u|_Y = v|_Y, \tag{5}$$

where $Y$ is the set of clean pixels, $u|_Y = (u_i)_{i \in Y}$, $\beta > 0$ is a regularization parameter, and $1 < \alpha \leq 2$. Compared with other denoising methods, this one performs very well: it works even when the image is extremely corrupted by impulse noise (e.g., 90% of the pixels).
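The detection step can be sketched with the classical adaptive median filter: for each pixel, the window grows from 3×3 up to `w_max` until its median is not itself an extreme value, and the pixel is flagged as noisy when the filter output differs from its input value. This is a minimal NumPy sketch of that rule under our own assumptions (symmetric border padding, `w_max = 7`), not the implementation used in [9] or [20]; the flagged set is the complement of $Y$ in (5).

```python
import numpy as np

def adaptive_median_detect(img, w_max=7):
    """Flag pixels whose adaptive-median-filter output differs from the input."""
    img = np.asarray(img, dtype=float)
    r_max = w_max // 2
    padded = np.pad(img, r_max, mode="symmetric")
    rows, cols = img.shape
    noisy = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            y = img[i, j]
            ci, cj = i + r_max, j + r_max
            out = None
            for r in range(1, r_max + 1):  # window sizes 3, 5, ..., w_max
                win = padded[ci - r:ci + r + 1, cj - r:cj + r + 1]
                lo, med, hi = win.min(), np.median(win), win.max()
                if lo < med < hi:  # the median is not an impulse
                    out = y if lo < y < hi else med
                    break
            if out is None:  # reached w_max: fall back to the median
                out = med
            noisy[i, j] = out != y
    return noisy

# Synthetic example: flat image with a sparse lattice of salt (255) and pepper (0) pixels.
img = np.full((24, 24), 128.0)
mask = np.zeros(img.shape, dtype=bool)
positions = [(i, j) for i in range(4, 20, 4) for j in range(4, 20, 4)]
for idx, (i, j) in enumerate(positions):
    img[i, j] = 0.0 if idx % 2 == 0 else 255.0
    mask[i, j] = True
clean_set = ~adaptive_median_detect(img)  # plays the role of Y in (5)
```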


Inspired by [10], we propose an advanced two-stage method. In stage 1, spectral processing, $\nu$-SVC is used to obtain the probability of each class at each pixel. In stage 2, by combining the two denoising methods above, we restore the misclassified pixels, using as constraints the same training pixels as in stage 1, whose ground-truth labels are known.


3 Our Classification Method 


SVM is a high-accuracy classifier, but because of the slack variables introduced for robustness, some misclassification is unavoidable, which is why the result is noisy. Naturally, we need to denoise the result using spatial information to correct the misclassified pixels. Our method first transforms the RGB color space into our 9-channel color space. The first stage then uses GS-SVM to generate a probability map for each pixel, and the second stage uses a spatial denoising method to remove the noise in the probability map.


3.1 Image Preprocessing 


In human vision, the brain divides an image based on information such as luminance and chroma, which is also our goal. In the traditional RGB color space, the three components are highly correlated. Fig. 5 (a)-(c) shows an example: the 3 channels of an RGB color image displayed separately. It is easy to see that a meaningful and accurate classification based on these single color channels is difficult. As mentioned in Section 2, there is a variety of color spaces with low channel correlation, such as HSI, YUV, Lab and CB. Considering low correlation, human vision and classification performance, we choose a combination of RGB, YUV and Lab to compose our 9-channel color space.

In the YUV color space, the Y component stands for luminance, or gray level, while the U and V components stand for chrominance, which describes the color and saturation of an image and specifies the color of a pixel. In the Lab color space, the three coordinates L, a and b indicate the luminance of the color (L = 0 is black and L = 100 is white), the color's position between red and green (negative a indicates green and positive a indicates magenta) and its position between blue and yellow (negative b indicates blue and positive b indicates yellow).

We denote the input RGB color image to be classified by $c = (c_1, c_2, c_3)$. In this preprocessing step, we transform the RGB color space into our compound 9-channel color space, which helps the SVM classifier in stage 1 give better results. Let $\hat{c} = (\hat{c}_1, \hat{c}_2, \hat{c}_3)$ be the Lab transform of $c$ and $\bar{c} = (\bar{c}_1, \bar{c}_2, \bar{c}_3)$ its YUV transform. These color spaces have different ranges:

$$\begin{array}{lll}
c_1 \in [0, 255], & \hat{c}_1 \in [0, 100], & \bar{c}_1 \in [16, 235], \\
c_2 \in [0, 255], & \hat{c}_2 \in [-128, 127], & \bar{c}_2 \in [16, 240], \\
c_3 \in [0, 255], & \hat{c}_3 \in [-128, 127], & \bar{c}_3 \in [16, 240],
\end{array}$$

so the data must be normalized. By rescaling $c$, $\hat{c}$ and $\bar{c}$ to $255[0, 1]^3$, we compose the new 9-channel vector-valued image

$$c^* = (c_1, c_2, c_3, \hat{c}_1, \hat{c}_2, \hat{c}_3, \bar{c}_1, \bar{c}_2, \bar{c}_3) \in 255[0, 1]^9.$$

$c^*$ is the input image to our SVM classifier in stage 1.
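The construction of $c^*$ can be sketched in NumPy as follows. Two conventions are assumptions on our part: YUV is computed as ITU-R BT.601 YCbCr (the convention of MATLAB's `rgb2ycbcr`, which matches the ranges $[16, 235]$ and $[16, 240]$ above), and Lab is computed from sRGB with a D65 white point. Each channel is rescaled from its nominal range to $[0, 255]$ and clipped.

```python
import numpy as np

# Standard sRGB (D65) -> XYZ matrix and D65 reference white.
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_WHITE = np.array([0.95047, 1.0, 1.08883])

def rgb_to_lab(rgb):
    """rgb: (..., 3) array in [0, 255]; returns (L, a, b) per pixel."""
    c = rgb / 255.0
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    t = (lin @ _M.T) / _WHITE
    d = 6.0 / 29.0
    f = np.where(t > d**3, np.cbrt(t), t / (3 * d**2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def rgb_to_ycbcr(rgb):
    """BT.601 YCbCr: Y in [16, 235], Cb and Cr in [16, 240]."""
    T = np.array([[65.481, 128.553, 24.966],
                  [-37.797, -74.203, 112.0],
                  [112.0, -93.786, -18.214]])
    return (rgb / 255.0) @ T.T + np.array([16.0, 128.0, 128.0])

def rescale(x, lo, hi):
    # Map each channel from its nominal range [lo, hi] to [0, 255].
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0) * 255.0

def nine_channel(rgb):
    """Build c* = (RGB, Lab, YUV) with every channel in [0, 255]."""
    rgb = np.asarray(rgb, dtype=float)
    lab = rescale(rgb_to_lab(rgb), [0, -128, -128], [100, 127, 127])
    yuv = rescale(rgb_to_ycbcr(rgb), [16, 16, 16], [235, 240, 240])
    return np.concatenate([rgb, lab, yuv], axis=-1)
```

For an $H \times W \times 3$ image, `nine_channel` returns an $H \times W \times 9$ array; reshaping it to $(HW) \times 9$ gives one feature vector per pixel for the classifier.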


3.2 Stage 1: Segmentation by GS-SVM 
3.2.1 SVM Classifier 


As mentioned in Section 2.1, SVMs are binary classifiers, while the pixels need to be classified into many classes. To turn the SVM into a multi-class classifier, the OAO (One-Against-One) strategy is efficient. It is similar to counting handshakes: if there are $n$ classes, we need $n(n-1)/2$ SVMs, one for each pair of classes. In this method, we choose $\nu$-SVC [30] combined with the OAO strategy. In our experiments, the polynomial, Gaussian and sigmoid kernels were each tested, and the Gaussian kernel gave the best segmentation results, so the SVM adopts the Gaussian kernel:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|_2^2}{2\sigma^2}\right).$$

It is flexible through the choice of the parameter $\sigma$. Writing $g = 1/(2\sigma^2)$, we can also represent the kernel function as

$$k(x, z) = \exp\left(-g\|x - z\|_2^2\right), \tag{6}$$

where the parameter $g$ and the penalty coefficient $C$ of the SVM (which corresponds to $\nu$ in the $\nu$-SVC classifier) form the hyperparameter set we optimize by grid search.
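One-against-one voting can be sketched as below: each of the $n(n-1)/2$ binary classifiers casts one vote, and the class with the most votes wins (ties go to the lowest class index here). The pairwise "classifiers" in the example are hypothetical 1-D nearest-center rules, used only to make the sketch runnable; in our method they are the trained $\nu$-SVCs.

```python
import numpy as np
from itertools import combinations

def oao_predict(x, n_classes, pairwise):
    """One-against-one voting: pairwise[(i, j)](x) > 0 votes for i, otherwise for j."""
    votes = np.zeros(n_classes, dtype=int)
    for i, j in combinations(range(n_classes), 2):  # n(n-1)/2 binary classifiers
        if pairwise[(i, j)](x) > 0:
            votes[i] += 1
        else:
            votes[j] += 1
    return int(np.argmax(votes)), votes

# Hypothetical pairwise rules: positive when x is closer to center i than to center j.
centers = [0.0, 5.0, 10.0]
pairwise = {(i, j): (lambda x, i=i, j=j: abs(x - centers[j]) - abs(x - centers[i]))
            for i, j in combinations(range(3), 2)}
label, votes = oao_predict(6.0, 3, pairwise)
```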


3.2.2 Grid Search for the Optimal Hyperparameters


Let $c^*$ be the 9-channel vector-valued image generated in Section 3.1, so that each test pixel is $x \in [0, 255]^9$. The training labels are 1 percent of the pixels, chosen randomly from the ground-truth image. Fewer training pixels are also possible, but we chose this setting to make a large number of tests practical. The decision function of the SVM classifier is

$$g(x) = \operatorname{sign}(f(x)), \quad \text{where } f(x) = \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b,$$

where the optimal solution depends on the penalty coefficient $C$ and the Gaussian kernel parameter $g$. With this input sample space and initial setting, we train our SVM model by grid search combined with k-fold cross-validation ($k = 5$).

The $(C, g)$ pairs traverse all the points of the grid on the $(C, g)$ hyperparameter plane. Whenever our approach visits a parameter point $S(C, g)$, it records an accuracy vector $A = (C, g, a)$, where the accuracy $a$ is the classifier's cross-validation accuracy under this parameter set. After all grid points on the hyperparameter plane have been traversed, the vector $A$ with the highest accuracy $a$ determines the final model, which we use to estimate the probability map for each pixel.


3.2.3 Estimate the Probability Map 


The SVM with decision function $f(x)$ is a binary classifier. To handle the $n$ ($n > 2$) classes in the ground-truth image, we apply the OAO (One-Against-One) strategy. This approach computes $n(n-1)/2$ decision functions $f_{ij}$ to predict the probability $p_i$ that pixel $x$ belongs to class $i$ ($1 \leq i, j \leq n$, $i \neq j$).


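Then we estimate the pairwise probability $\operatorname{Prob}(y = i \mid y = i \text{ or } j, x)$ as

$$r_{ij} \approx \frac{1}{1 + e^{A f_{ij}(x) + B}}, \tag{7}$$

where $A$ and $B$ are parameters fitted on the training data (Platt scaling).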


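With $r_{ij}$, we predict a pixel-wise probability vector $p = [p_1, p_2, \dots, p_n]^T$ by minimizing the loss function

$$\min_p \sum_{i=1}^{n} \sum_{j \neq i} \left(r_{ji} p_i - r_{ij} p_j\right)^2 \quad \text{s.t. } p_i \geq 0 \ \forall i, \quad \sum_{i=1}^{n} p_i = 1. \tag{8}$$

Minimizing this objective yields the probability vector most consistent with the pairwise estimates, since $r_{ij} \approx p_i / (p_i + p_j)$ implies $r_{ji} p_i \approx r_{ij} p_j$.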
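[32] provides an effective way to solve this minimization problem. We can rewrite (8) as

$$\min_p 2p^T Q p = \min_p \frac{1}{2} p^T Q p, \tag{9}$$

where

$$Q_{ij} = \begin{cases} \sum_{s \neq i} r_{si}^2, & i = j, \\ -r_{ji} r_{ij}, & i \neq j. \end{cases}$$

In this way, the problem becomes a convex quadratic programming problem. $p$ attains the minimum if and only if there exists a scalar $b$ such that

$$\begin{bmatrix} Q & e \\ e^T & 0 \end{bmatrix} \begin{bmatrix} p \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \tag{10}$$

where $e$ is the column vector of all ones, $b$ is the Lagrange multiplier of the equality constraint in (8), and the 0 in the upper part of the right-hand side is a column zero vector. In our tests, we use the SVM toolbox LIBSVM to compute $p(x)$.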
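The coupling step (9)-(10) reduces to one small linear solve per pixel. The NumPy sketch below builds $Q$ from the pairwise estimates $r_{ij}$ and solves the bordered system (10) directly; LIBSVM uses its own solver, so this is an illustration of the formulas, not of LIBSVM's code. In the example, the $r_{ij}$ are made perfectly consistent with a known $p$, so the solve must recover that $p$ with $b = 0$.

```python
import numpy as np

def pairwise_coupling(r):
    """Solve (10) for p given r[i, j] ~ P(y = i | y in {i, j}, x); diagonal of r is unused."""
    n = r.shape[0]
    Q = -r.T * r  # off-diagonal entries: Q_ij = -r_ji * r_ij
    off = r**2
    np.fill_diagonal(off, 0.0)
    np.fill_diagonal(Q, off.sum(axis=0))  # Q_ii = sum_{s != i} r_si^2
    # Bordered system [[Q, e], [e^T, 0]] [p; b] = [0; 1].
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = Q
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n]

p_true = np.array([0.5, 0.3, 0.2])
r = p_true[:, None] / (p_true[:, None] + p_true[None, :])  # consistent pairwise estimates
np.fill_diagonal(r, 0.0)
p, b = pairwise_coupling(r)
```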

Stage 1 finally produces a three-dimensional array $V$ indexed by $i$, $j$ and $k$, where $V_{i,j,k} = p_k(x_{ij})$ indicates whether the test point $(i, j)$ belongs to class $k$. $V_{i,j,k} = 1$ means that point $(i, j)$ belongs to class $k$, whereas $V_{i,j,k} = 0$ means that point $(i, j)$ belongs to another class.


3.3 Stage 2: Denoising the Probability Map 


We now have the prediction array $V$, which is a rough segmentation result obtained by the spectral-based SVM classifier. The second stage uses a variational model that incorporates spatial information to restore this noisy result.

Let $v_k = V_{:,:,k}$ denote the probability map of class $k$. As mentioned in Section 2.3, by adding the second-order term and smoothing the total variation term, our denoising method can be written as the optimization problem

$$\min_u \frac{1}{2}\|u - v_k\|_2^2 + \beta_1\|\nabla u\|_1 + \frac{\beta_2}{2}\|\nabla u\|_2^2 \quad \text{s.t. } u|_Y = v_k|_Y, \tag{11}$$

where $Y$ is the same training point set as in the first stage, and $\beta_1, \beta_2 > 0$ are regularization parameters that balance data fidelity against smoothness. This model is adapted from the traditional TV denoising approach; it performs and adapts well while avoiding the staircase effect to a great extent.
To solve this optimization problem, we apply ADMM (alternating direction method of multipliers) [4]. First, we convert it to

$$\min_{u, s, w} \frac{1}{2}\|u - v_k\|_2^2 + \beta_1\|s\|_1 + \frac{\beta_2}{2}\|Du\|_2^2 + \iota_W(w) \quad \text{s.t. } s = Du \text{ and } w = u. \tag{12}$$


Here $D$ is the discrete gradient operator $\nabla$, $D = \begin{pmatrix} D_x \\ D_y \end{pmatrix} \in \mathbb{R}^{2n \times n}$, where $D_x$ and $D_y$ are its horizontal and vertical components and $n$ is the number of pixels in the image. $\iota_W$ is the indicator function

$$\iota_W(w) = \begin{cases} 0, & w|_Y = v_k|_Y, \\ +\infty, & \text{otherwise}. \end{cases}$$


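Then, by the augmented Lagrangian method, we define

$$\mathcal{L}(u, s, w, \lambda) = \frac{1}{2}\|u - v_k\|_2^2 + \beta_1\|s\|_1 + \frac{\beta_2}{2}\|Du\|_2^2 + \iota_W(w) + \frac{\mu}{2}\|Eu - g - \lambda\|_2^2, \tag{13}$$

where $E = \begin{pmatrix} D \\ I \end{pmatrix}$, $g = \begin{pmatrix} s \\ w \end{pmatrix}$, $\lambda$ is the (scaled) Lagrange multiplier, and $\mu > 0$ is a constant penalty parameter.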


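We solve for $u$ and $g$ by updating one variable at a time while the others are fixed, alternating and repeating the updates. That is, for $t = 0, 1, 2, \dots$:

$$\begin{aligned}
\text{step 1: } & u^{(t+1)} = \arg\min_u \Big\{ \frac{1}{2}\|u - v_k\|_2^2 + \frac{\beta_2}{2}\|Du\|_2^2 + \frac{\mu}{2}\|Eu - g^{(t)} - \lambda^{(t)}\|_2^2 \Big\}, \\
\text{step 2: } & g^{(t+1)} = \arg\min_g \Big\{ \beta_1\|s\|_1 + \iota_W(w) + \frac{\mu}{2}\|Eu^{(t+1)} - g - \lambda^{(t)}\|_2^2 \Big\}, \\
\text{step 3: } & \lambda^{(t+1)} = \lambda^{(t)} - Eu^{(t+1)} + g^{(t+1)}.
\end{aligned} \tag{14}$$

ADMM thus turns a problem in several optimization variables into a sequence of single-variable problems. The first step is a least-squares problem, whose solution is

$$u^{(t+1)} = \left(I + \beta_2 D^T D + \mu E^T E\right)^{-1}\left(v_k + \mu E^T\big(g^{(t)} + \lambda^{(t)}\big)\right). \tag{15}$$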


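Under periodic boundary conditions, $D^T D$ is block circulant, so the matrix in (15) is diagonalized by the 2D discrete Fourier transform and (15) can be solved efficiently with the FFT.

For the second step, the objective separates in $s$ and $w$, with $\lambda^{(t)} = (\lambda_s^{(t)}; \lambda_w^{(t)})$ partitioned like $g$:

$$s^{(t+1)} = \arg\min_s \Big\{ \beta_1\|s\|_1 + \frac{\mu}{2}\|Du^{(t+1)} - s - \lambda_s^{(t)}\|_2^2 \Big\}, \tag{16}$$

$$w^{(t+1)} = \arg\min_w \Big\{ \iota_W(w) + \frac{\mu}{2}\|u^{(t+1)} - w - \lambda_w^{(t)}\|_2^2 \Big\}. \tag{17}$$

Problem (16) decouples into $2n$ independent scalar problems, each solved by soft thresholding:

$$[s^{(t+1)}]_i = \operatorname{sgn}\big([r]_i\big) \max\Big\{ |[r]_i| - \frac{\beta_1}{\mu}, 0 \Big\}, \quad i = 1, \dots, 2n, \tag{18}$$

where $r = Du^{(t+1)} - \lambda_s^{(t)}$. The solution of (17) is a projection onto the constraint set:

$$[w^{(t+1)}]_i = \begin{cases} [v_k]_i, & i \in Y, \\ [u^{(t+1)} - \lambda_w^{(t)}]_i, & \text{otherwise}. \end{cases} \tag{19}$$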


Through the process in Section 3.3, we obtain for each class $k$ a restored map indicating how strongly each pixel belongs to the $k$-th class, analogous to the probability map of the first stage; together these maps form a three-dimensional array $U_{:,:,k}$. By finding the largest value of $U_{i,j,k}$ over $k$, we determine the class to which pixel $(i, j)$ belongs. This final classification step can be written as

$$\operatorname*{argmax}_{k} U_{i,j,k}.$$
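The whole second stage, the ADMM iteration (14)-(19) for each class map followed by the argmax rule, can be sketched compactly in NumPy. The sketch assumes periodic boundary conditions with forward differences for $D$, so that (15) is solved with the FFT using the eigenvalues of $D^T D$, and it initializes $w$ with $v_k$. The values $\beta_1 = 0.4$, $\beta_2 = 0.1$, $\mu = 1$ and the iteration count are illustrative assumptions, not the settings of our experiments, and the toy probability map is synthetic.

```python
import numpy as np

def grad(u):
    # Periodic forward differences: D u = (D_x u, D_y u).
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def grad_T(px, py):
    # Adjoint D^T of the periodic forward-difference operator.
    return (np.roll(px, 1, axis=1) - px) + (np.roll(py, 1, axis=0) - py)

def denoise_class_map(v, train_mask, beta1=0.4, beta2=0.1, mu=1.0, iters=300):
    """ADMM sketch for (11) on one class map v = V[:, :, k]."""
    n1, n2 = v.shape
    k1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    # Eigenvalues of D^T D under periodic boundary conditions.
    lap = (2 - 2 * np.cos(2 * np.pi * k1 / n1)) + (2 - 2 * np.cos(2 * np.pi * k2 / n2))
    denom = (1 + mu) + (beta2 + mu) * lap  # I + beta2 D^T D + mu (D^T D + I)
    sx, sy, w = np.zeros_like(v), np.zeros_like(v), v.copy()
    lx, ly, lw = np.zeros_like(v), np.zeros_like(v), np.zeros_like(v)
    for _ in range(iters):
        # Step 1, eq. (15): least-squares update solved with the FFT.
        rhs = v + mu * (grad_T(sx + lx, sy + ly) + w + lw)
        u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
        ux, uy = grad(u)
        # Step 2, eq. (18): soft thresholding of r = D u - lambda_s.
        rx, ry = ux - lx, uy - ly
        sx = np.sign(rx) * np.maximum(np.abs(rx) - beta1 / mu, 0.0)
        sy = np.sign(ry) * np.maximum(np.abs(ry) - beta1 / mu, 0.0)
        # Step 2, eq. (19): projection onto {w : w|_Y = v|_Y}.
        w = np.where(train_mask, v, u - lw)
        # Step 3: multiplier update lambda <- lambda - E u + g.
        lx, ly, lw = lx - ux + sx, ly - uy + sy, lw - u + w
    return u

def segment(V, train_mask, **kw):
    """Denoise every class map and assign each pixel to argmax_k U[:, :, k]."""
    U = np.stack([denoise_class_map(V[:, :, k], train_mask, **kw)
                  for k in range(V.shape[2])], axis=2)
    return np.argmax(U, axis=2)

# Synthetic two-class example: left half class 0, right half class 1, 10% label flips.
rng = np.random.default_rng(1)
truth = np.zeros((20, 20), dtype=int)
truth[:, 10:] = 1
train = np.zeros((20, 20), dtype=bool)
train[::5, ::5] = True  # training pixels keep their ground-truth labels
noisy = truth.copy()
flip = (rng.random((20, 20)) < 0.1) & ~train
noisy[flip] = 1 - noisy[flip]
V = np.stack([noisy == 0, noisy == 1], axis=2).astype(float)
labels = segment(V, train)
```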


Thus, in the first stage we operate only along the $k$ component of $V_{i,j,k}$, which is a spectral dimension, while in the second stage we operate on the $i$-$j$ plane, which is a spatial plane. The process of the whole segmentation method is shown in Figure 6.


Figure 6: The process of the whole segmentation method (flowchart: color image to be segmented; lifting the dimension of the test pixels in the color image; choosing the training point set randomly from the ground truth of the color image; training the SVM classifier; grid search for the optimal hyperparameter set; generating a pixel-wise probability map with the spectral-based SVM classifier; denoising the segmentation result with the adapted TV model using spatial information; getting the segmented image)


4 Experiment and Result Analysis 


In this section, we test our method on the Stanford Background Dataset, two synthetic images, and the Berkeley Segmentation Dataset, which are widely used to evaluate geometric and semantic segmentation approaches. To verify the performance of our method, we test the segmentation results under different color spaces. We also compare our approach with a high-performance method [5], which uses a three-stage SLaT approach to segment and denoise color images.

Our tests mainly focus on the 8 images in Figure 7, namely No.0000631, No.0002395, No.0101121, No.3002154, No.4000066, No.5000196, No.6000075, and No.6000161 from the Stanford Background Dataset.

The experiments are run in MATLAB R2020b on an Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz with 16.0 GB RAM. All test images are JPG files, initially in the RGB color space. The noise in all noisy images is Gaussian and is added with the MATLAB function imnoise.
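For readers reproducing the noisy inputs outside MATLAB, the sketch below approximates `imnoise(I, 'gaussian', m, var)` for an image scaled to [0, 1]: it adds i.i.d. Gaussian noise with the given mean and variance and clips the result back to [0, 1]. The function name `add_gaussian_noise` and the `seed` parameter are hypothetical, and the random streams will of course differ from MATLAB's.

```python
import numpy as np

def add_gaussian_noise(img, mean=0.0, var=0.01, seed=None):
    """Add Gaussian noise N(mean, var) to an image in [0, 1], then clip to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(img, dtype=np.float64) + rng.normal(mean, np.sqrt(var), size=np.shape(img))
    return np.clip(noisy, 0.0, 1.0)
```

A JPG loaded as 8-bit RGB would first be converted with `img.astype(np.float64) / 255.0`. Note that `var` is a variance, so the standard deviation passed to `rng.normal` is its square root.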
[Figure 7 panels include (a) Street, (e) Building, (f) Palace]

Figure 7: The 8 example images used in the color space tests


To quantify the segmentation results, we define the accuracy rate (AR) as the proportion of pixels whose classification result agrees with the ground truth of the image:

$$
\mathrm{AR} = \frac{N_c}{N_t}\times 100\%,
$$

where $N_c$ is the number of correctly classified pixels and $N_t$ is the total number of pixels in the input image.
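The AR definition translates directly into code. A minimal NumPy sketch, where `accuracy_rate` is a hypothetical helper comparing a predicted label map with a ground-truth label map of the same shape:

```python
import numpy as np

def accuracy_rate(pred, gt):
    """AR = N_c / N_t * 100, where N_c counts pixels whose label matches the ground truth."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    n_c = np.count_nonzero(pred == gt)  # correctly classified pixels
    n_t = gt.size                       # total number of pixels
    return 100.0 * n_c / n_t

print(accuracy_rate([[0, 1], [1, 2]], [[0, 1], [2, 2]]))  # 75.0
```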


4.1 Effect of Different Color Spaces 


Figures 8-9 show the segmentation results for different color spaces. Column (a) shows 3 input images to be segmented from the Stanford dataset, and column (b) shows the segmentation result without any color transformation. For the first image, some pixels at the boundary between sky and sea are misclassified. For the second image, the cloud is mistakenly classified as part of a cow. In the center of the third image, the segmentation of this relatively complex scene is not satisfactory. Column (c) corresponds to the RGB+Lab color space; its result is clearly better than using RGB alone, and for the first image it is close to the result of our color space. Column (d) corresponds to our RGB+YUV+Lab 9-channel compound color space. These results show that our approach noticeably improves color image segmentation: even with background interference, the target region is segmented accurately, and the result is closer to human visual identification.

Other color spaces, such as HSV, and their combinations were also tested; the AR for every color space we considered is listed in Table 1.
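To make the 9-channel RYL space concrete, here is a minimal NumPy sketch that stacks RGB, YUV, and Lab channels. It assumes sRGB input scaled to [0, 1], the BT.601 analog YUV definition (U = 0.492(B − Y), V = 0.877(R − Y)), and CIE L\*a\*b\* with a D65 white point (the defaults of MATLAB's `rgb2lab`). The exact conversion formulas and any per-channel scaling used in the paper are not given in this section, so treat these as assumptions; all function names are hypothetical.

```python
import numpy as np

# BT.601 luma weights; chroma rows follow U = 0.492 (B - Y), V = 0.877 (R - Y).
_LUMA = np.array([0.299, 0.587, 0.114])
_RGB2YUV = np.vstack([_LUMA,
                      0.492 * (np.array([0.0, 0.0, 1.0]) - _LUMA),
                      0.877 * (np.array([1.0, 0.0, 0.0]) - _LUMA)])

# Linear sRGB -> CIE XYZ with a D65 white point.
_RGB2XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                     [0.2126729, 0.7151522, 0.0721750],
                     [0.0193339, 0.1191920, 0.9503041]])
_WHITE = _RGB2XYZ.sum(axis=1)  # XYZ of sRGB white, used as the reference white

def rgb_to_yuv(rgb):
    """Apply the 3x3 YUV matrix to the last axis of an (..., 3) array."""
    return rgb @ _RGB2YUV.T

def rgb_to_lab(rgb):
    """Convert sRGB in [0, 1] to CIE L*a*b* (D65)."""
    # Undo the sRGB transfer curve to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = (lin @ _RGB2XYZ.T) / _WHITE
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3.0 * d * d) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def compound_ryl(rgb):
    """Stack RGB, YUV and Lab into an (H, W, 9) array (no per-channel normalization)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return np.concatenate([rgb, rgb_to_yuv(rgb), rgb_to_lab(rgb)], axis=-1)
```

The Lab channels live on a much larger numeric range (L in [0, 100]) than RGB and YUV, so in practice one would likely standardize each channel before feeding the pixels to an SVM with an RBF kernel.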
[Figure 8 panels: (a) Input, (b) RGB color space, (c) RGB+Lab color space, (d) Our 9-channel space]

Figure 8: Comparison between different color spaces on examples (1)-(4)
[Figure 9 panels: (a) Input, (b) RGB color space, (c) RGB+Lab color space, (d) Our 9-channel space]

Figure 9: Comparison between different color spaces on examples (5)-(8)
Table 1: AR (%) of the RGB, HSV, YUV, RGB+Lab (R+L), RGB+HSV (R+H), RGB+HSV+YUV (RHY), and our RGB+YUV+Lab (RYL) color spaces. The highest AR in each row is marked with *.

Example       RGB     HSV     YUV     R+L     R+H     RHY     RYL
(1) Street    94.94   95.28   95.41   94.50   95.17   95.34   95.57*
(2) Ship      96.42   96.49*  91.02   94.25   92.73   93.73   95.17
(3) Boat      96.76   98.78*  97.25   98.76   98.70   98.25   98.75
(4) Cow       97.68   98.23   97.89   98.07   97.51   97.86   98.27*
(5) Building  98.49   98.10   98.18   98.17   98.08   98.30   98.57*
(6) Palace    95.42   95.72   95.42   95.59   95.23   95.49   95.92*
(7) Field     98.33   97.71   98.17   98.19   97.80   98.05   98.59*
(8) Coast     97.32   97.46   98.02   98.11   97.15   97.98   98.61*
Average       96.92   97.22   96.42   96.96   96.55   96.88   97.43*


In the header, for simplicity, the RGB color space is denoted R, the HSV color space H, the YUV color space Y, and the Lab color space L. Examples (1)-(8) are No.6000075, No.0002395, No.0101121, No.3002154, No.5000196, No.6000161, No.4000066, and No.0000631 in the Stanford Background Dataset, as shown in Figures 8-9. These images include animals, portraits, natural scenery, and city photos; some are simple and others are complicated, so together they show the performance of the different color spaces as fully as possible. The table shows that our 9-channel RYL space has the highest AR in 6 of the 8 images and the highest average AR (97.43). HSV and RGB+Lab also perform reasonably well, but because the 9-channel RYL color space contains more spectral information, its segmentation results are smoother and closer to human vision.
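The "highest AR in most images" claim can be checked directly against the values in Table 1. The short script below copies those values (no new data) and counts, for each example, which color space wins; variable names are illustrative only.

```python
import numpy as np

spaces = ["RGB", "HSV", "YUV", "R+L", "R+H", "RHY", "RYL"]
# Rows: examples (1)-(8) of Table 1; columns follow `spaces`.
ar = np.array([
    [94.94, 95.28, 95.41, 94.50, 95.17, 95.34, 95.57],
    [96.42, 96.49, 91.02, 94.25, 92.73, 93.73, 95.17],
    [96.76, 98.78, 97.25, 98.76, 98.70, 98.25, 98.75],
    [97.68, 98.23, 97.89, 98.07, 97.51, 97.86, 98.27],
    [98.49, 98.10, 98.18, 98.17, 98.08, 98.30, 98.57],
    [95.42, 95.72, 95.42, 95.59, 95.23, 95.49, 95.92],
    [98.33, 97.71, 98.17, 98.19, 97.80, 98.05, 98.59],
    [97.32, 97.46, 98.02, 98.11, 97.15, 97.98, 98.61],
])

best_per_row = [spaces[k] for k in ar.argmax(axis=1)]
col_means = ar.mean(axis=0)

print(best_per_row)                     # ['RYL', 'HSV', 'HSV', 'RYL', 'RYL', 'RYL', 'RYL', 'RYL']
print(best_per_row.count("RYL"))        # 6
print(spaces[int(col_means.argmax())])  # RYL
```

The recomputed column means agree with the Average row of Table 1 up to rounding in the second decimal.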


4.2 Comparison with Other Segmentation Methods and Segmentation of Noisy Images


To further test the performance of our method, we also compare it with other state-of-the-art segmentation approaches. According to [5], the Smoothing, Lifting and Thresholding (SLaT) method proposed there performs significantly better than other methods. We therefore focus on comparing our method with the SLaT method.

Figure 10 illustrates the segmentation results for a synthetic image containing 6 phases: 5 circles of different colors and the background. The input image is corrupted by Gaussian noise with mean 0 and variance 0.04.


[Figure 10 panels: (a) Input noise image, (b) Ground truth, (c) SLaT method, (d) Our method]

Figure 10: Comparison between different segmentation methods. The SLaT method is the one in [5].


The quantitative accuracy rates are listed in Table 2. Both the segmentation results and the table show that, with Gaussian noise of variance 0.04, our method is superior to the SLaT method in visual quality and in accuracy (AR of 99.94 versus 99.51).

Table 2: AR of the different methods on the noisy 6-phase image and of our approach on the clear image
Method    SLaT method    Our proposed method
AR        99.51          99.94

We also compare the methods on images from the Stanford dataset. Owing to certain limitations of the SLaT method, we can only evaluate these segmentation results visually; they are shown in Figures 11-12.


[Figure 11 panels: (a) Input, (b) SLaT method, (c) Our method]

Figure 11: Comparison between our method and the SLaT method on examples (1)-(4). The SLaT method is the one proposed in [5].
[Figure 12 panels: (a) Input, (b) SLaT method, (c) Our method]

Figure 12: Comparison between our method and the SLaT method on examples (1)-(4). The SLaT method is the one proposed in [5].


We also tested segmentation on noisy images. The noise added to the 8 test images is Gaussian with mean 0 and variance 0.001. The segmentation results of our method and of the SLaT method [5] are shown in Figures 13-14, and the accuracy rates of our method on the noisy images are listed in Table 3.

The visual results show that our segmentation of the noisy images is consistent with that of the clear images and better than that of the SLaT method: our method segments objects more completely as a whole, and the visual effect is more consistent with human vision. Numerically, our method achieves accuracy on noisy images similar to that on clear images, and some noisy results (Street and Ship in Table 3) even have higher accuracy than the clear ones.




Table 3: Accuracy rate (AR, %) of the segmentation results of our adapted 2-stage method on the clear and noisy images.
Example       Clear   Noise   Example       Clear   Noise
(1) Street    95.57   95.90   (5) Building  98.57   98.02
(2) Ship      95.17   96.69   (6) Palace    95.92   95.51
(3) Boat      98.75   98.68   (7) Field     98.59   97.88
(4) Cow       98.27   98.27   (8) Coast     98.61   97.02
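The claim that accuracy on noisy images stays close to that on clear images can be quantified from Table 3 alone. The script below copies the table values and reports which examples improve under noise, the two averages, and the largest drop; variable names are illustrative only.

```python
import numpy as np

names = ["Street", "Ship", "Boat", "Cow", "Building", "Palace", "Field", "Coast"]
clear = np.array([95.57, 95.17, 98.75, 98.27, 98.57, 95.92, 98.59, 98.61])
noisy = np.array([95.90, 96.69, 98.68, 98.27, 98.02, 95.51, 97.88, 97.02])

diff = noisy - clear
improved = [n for n, d in zip(names, diff) if d > 0]
worst = names[int(diff.argmin())]

print(improved)                                  # ['Street', 'Ship']
print(f"{clear.mean():.2f} {noisy.mean():.2f}")  # 97.43 97.25
print(worst, f"{diff.min():.2f}")                # Coast -1.59
```

So the average AR drops by less than 0.2 percentage points under noise, with the largest single drop on the Coast example.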


[Figure 13 panels: (a) Noise image, (b) Ground truth, (c) SLaT method, (d) Our method]

Figure 13: Comparison between different color spaces on examples (1)-(4). Here the ground truth is the clear image, and the noisy images are obtained by adding Gaussian noise with mean 0 and variance 0.01 to the clear images.
[Figure 14 panels: (a) Noise image, (b) Ground truth, (c) SLaT method, (d) Our method]

Figure 14: Comparison between different color spaces on examples (5)-(8). Here the ground truth is the clear image, and the noisy images are obtained by adding Gaussian noise with mean 0 and variance 0.01 to the clear images.
5 Conclusion


In this paper, we proposed an adapted 2-stage method to segment color images. In the preprocessing step, we lift the dimension of the RGB color image by compounding it with the YUV and Lab color spaces to provide more spectral information. In the first stage, we segment the input image with a spectral-based SVM classifier whose hyperparameters are optimized by grid search. In the second stage, a spatial-based variational model is used to denoise and smooth the result of the first stage. This adapted two-stage method can segment both clear and noisy color images and obtains high-accuracy segmentation results in line with human vision. Furthermore, the method does not require presetting any hyperparameters by hand, so it is versatile and can easily be applied to any color image. Experimental results indicate that, compared with other advanced segmentation methods, our approach gives better results both quantitatively and visually. A drawback of the method is that when the input color image is very noisy, the SVM classifier in the first stage takes a long time to segment it. Further work can address this problem.